AITopics | xobject subtype image bitspercomponent 8

Collaborating Authors

xobject subtype image bitspercomponent 8

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Hybrid Imitation-Learning Motion Planner for Urban Driving

Gariboldi, Cristian, Corno, Matteo, Jin, Beng

arXiv.org Artificial IntelligenceSep-4-2024

With the release of open source datasets such as nuPlan and Argoverse, the research around learning-based planners has spread a lot in the last years. Existing systems have shown excellent capabilities in imitating the human driver behaviour, but they struggle to guarantee safe closed-loop driving. Conversely, optimization-based planners offer greater security in short-term planning scenarios. To confront this challenge, in this paper we propose a novel hybrid motion planner that integrates both learning-based and optimization-based techniques. Initially, a multilayer perceptron (MLP) generates a human-like trajectory, which is then refined by an optimization-based component. This component not only minimizes tracking errors but also computes a trajectory that is both kinematically feasible and collision-free with obstacles and road boundaries. Our model effectively balances safety and human-likeness, mitigating the trade-off inherent in these objectives. We validate our approach through simulation experiments and further demonstrate its efficacy by deploying it in real-world self-driving vehicles.

hybrid imitation-learning motion planner, urban driving, xobject subtype image bitspercomponent 8

arXiv.org Artificial Intelligence

2409.02871

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.53)

Add feedback

Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models

Ding, Ning, Chen, Yulin, Cui, Ganqu, Lv, Xingtai, Zhao, Weilin, Xie, Ruobing, Zhou, Bowen, Liu, Zhiyuan, Sun, Maosong

arXiv.org Artificial IntelligenceMar-26-2024

Underlying data distributions of natural language, programming code, and mathematical symbols vary vastly, presenting a complex challenge for large language models (LLMs) that strive to achieve high performance across all three domains simultaneously. Achieving a very high level of proficiency for an LLM within a specific domain often requires extensive training with relevant corpora, which is typically accompanied by a sacrifice in performance in other domains. In this paper, we propose to fuse models that are already highly-specialized directly. The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics. A token-level gating mechanism is introduced to blend the specialists' outputs. A two-stage training strategy accompanied by balanced sampling is designed to ensure stability. To effectively train the fused model, we further construct a high-quality supervised instruction tuning dataset, UltraChat 2, which includes text, code, and mathematical content. This dataset comprises approximately 300,000 instructions and covers a wide range of topics in each domain. Experiments show that our model could simultaneously achieve mastery of the three crucial domains.

devicegray filter flatedecode height 512, interpolate true length 12788, xobject subtype image bitspercomponent 8, (8 more...)

arXiv.org Artificial Intelligence

2403.08281

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)

Add feedback

Disentangling representations of retinal images with generative models

Müller, Sarah, Koch, Lisa M., Lensch, Hendrik P. A., Berens, Philipp

arXiv.org Artificial IntelligenceFeb-29-2024

Retinal fundus images play a crucial role in the early detection of eye diseases and, using deep learning approaches, recent studies have even demonstrated their potential for detecting cardiovascular risk factors and neurological disorders. However, the impact of technical factors on these images can pose challenges for reliable AI applications in ophthalmology. For example, large fundus cohorts are often confounded by factors like camera type, image quality or illumination level, bearing the risk of learning shortcuts rather than the causal relationships behind the image generation process. Here, we introduce a novel population model for retinal fundus images that effectively disentangles patient attributes from camera effects, thus enabling controllable and highly realistic image generation. To achieve this, we propose a novel disentanglement loss based on distance correlation. Through qualitative and quantitative analyses, we demonstrate the effectiveness of this novel loss function in disentangling the learned subspaces. Our results show that our model provides a new perspective on the complex relationship between patient attributes and technical confounders in retinal fundus image generation.

disentangling representation, generative model, xobject subtype image bitspercomponent 8, (8 more...)

arXiv.org Artificial Intelligence

2402.19186

Genre: Research Report > New Finding (0.53)

Industry: Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Add feedback

Disentangled 3D Scene Generation with Layout Learning

Epstein, Dave, Poole, Ben, Mildenhall, Ben, Efros, Alexei A., Holynski, Aleksander

arXiv.org Artificial IntelligenceFeb-26-2024

We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs from scratch - each representing its own object - along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation. For results and an interactive demo, see our project page at https://dave.ml/layoutlearning/

bitspercomponent 8, layout learning, scene generation, (6 more...)

arXiv.org Artificial Intelligence

2402.16936

Country: Europe > United Kingdom (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (0.33)

Add feedback

SINR-Aware Deep Reinforcement Learning for Distributed Dynamic Channel Allocation in Cognitive Interference Networks

Cohen, Yaniv, Gafni, Tomer, Greenberg, Ronen, Cohen, Kobi

arXiv.org Artificial IntelligenceFeb-17-2024

We consider the problem of dynamic channel allocation (DCA) in cognitive communication networks with the goal of maximizing a global signal-to-interference-plus-noise ratio (SINR) measure under a specified target quality of service (QoS)-SINR for each network. The shared bandwidth is partitioned into K channels with frequency separation. In contrast to the majority of existing studies that assume perfect orthogonality or a one- to-one user-channel allocation mapping, this paper focuses on real-world systems experiencing inter-carrier interference (ICI) and channel reuse by multiple large-scale networks. This realistic scenario significantly increases the problem dimension, rendering existing algorithms inefficient. We propose a novel multi-agent reinforcement learning (RL) framework for distributed DCA, named Channel Allocation RL To Overlapped Networks (CARLTON). The CARLTON framework is based on the Centralized Training with Decentralized Execution (CTDE) paradigm, utilizing the DeepMellow value-based RL algorithm. To ensure robust performance in the interference-laden environment we address, CARLTON employs a low-dimensional representation of observations, generating a QoS-type measure while maximizing a global SINR measure and ensuring the target QoS-SINR for each network. Our results demonstrate exceptional performance and robust generalization, showcasing superior efficiency compared to alternative state-of-the-art methods, while achieving a marginally diminished performance relative to a fully centralized approach.

cognitive interference network, dynamic channel allocation, sinr-aware deep reinforcement learning, (2 more...)

arXiv.org Artificial Intelligence

2402.17773

Genre: Research Report > New Finding (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Physics-Informed Deep Reversible Regression Model for Temperature Field Reconstruction of Heat-Source Systems

Gong, Zhiqiang, Zhou, Weien, Zhang, Jun, Peng, Wei, Yao, Wen

arXiv.org Artificial IntelligenceJul-4-2021

Temperature monitoring during the life time of heat source components in engineering systems becomes essential to guarantee the normal work and the working life of these components. However, prior methods, which mainly use the interpolate estimation to reconstruct the temperature field from limited monitoring points, require large amounts of temperature tensors for an accurate estimation. This may decrease the availability and reliability of the system and sharply increase the monitoring cost. To solve this problem, this work develops a novel physics-informed deep reversible regression models for temperature field reconstruction of heat-source systems (TFR-HSS), which can better reconstruct the temperature field with limited monitoring points unsupervisedly. First, we define the TFR-HSS task mathematically, and numerically model the task, and hence transform the task as an image-to-image regression problem. Then this work develops the deep reversible regression model which can better learn the physical information, especially over the boundary. Finally, considering the physical characteristics of heat conduction as well as the boundary conditions, this work proposes the physics-informed reconstruction loss including four training losses and jointly learns the deep surrogate model with these losses unsupervisedly. Experimental studies have conducted over typical two-dimensional heat-source systems to demonstrate the effectiveness of the proposed method.

devicegray filter flatedecode height 1141, pgf, xobject subtype image bitspercomponent 8, (9 more...)

arXiv.org Artificial Intelligence

2106.11929

Genre: Research Report (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.80)

Add feedback

An Efficient Cervical Whole Slide Image Analysis Framework Based on Multi-scale Semantic and Spatial Deep Features

Wei, Ziquan, Cheng, Shenghua, Liu, Xiuli, Zeng, Shaoqun

arXiv.org Artificial IntelligenceJul-2-2021

Digital gigapixel whole slide image (WSI) is widely used in clinical diagnosis, and automated WSI analysis is key for computer-aided diagnosis. Currently, analyzing the integrated descriptor of probabilities or feature maps from massive local patches encoded by ResNet classifier is the main manner for WSI-level prediction. Feature representations of the sparse and tiny lesion cells in cervical slides, however, are still challengeable for the under-promoted upstream encoders, while the unused spatial representations of cervical cells are the available features to supply the semantics analysis. As well as patches sampling with overlap and repetitive processing incur the inefficiency and the unpredictable side effect. This study designs a novel inline connection network (InCNet) by enriching the multi-scale connectivity to build the lightweight model named You Only Look Cytopathology Once (YOLCO) with the additional supervision of spatial information. The proposed model allows the input size enlarged to megapixel that can stitch the WSI without any overlap by the average repeats decreased from $10^3\sim10^4$ to $10^1\sim10^2$ for collecting features and predictions at two scales. Based on Transformer for classifying the integrated multi-scale multi-task features, the experimental results appear $0.872$ AUC score better and $2.51\times$ faster than the best conventional method in WSI classification on multicohort datasets of 2,019 slides from four scanning devices.

group type group cs devicergb, semantic and spatial deep feature, slide image analysis framework, (6 more...)

arXiv.org Artificial Intelligence

2106.15113

Country:

Europe > United Kingdom (0.04)
North America > United States > Ohio (0.04)

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.53)

Add feedback

Monte Carlo Variational Auto-Encoders

Thin, Achille, Kotelevskii, Nikita, Doucet, Arnaud, Durmus, Alain, Moulines, Eric, Panov, Maxim

arXiv.org Machine LearningJun-30-2021

Variational auto-encoders (VAE) are popular deep latent variable models which are trained by maximizing an Evidence Lower Bound (ELBO). To obtain tighter ELBO and hence better variational approximations, it has been proposed to use importance sampling to get a lower variance estimate of the evidence. However, importance sampling is known to perform poorly in high dimensions. While it has been suggested many times in the literature to use more sophisticated algorithms such as Annealed Importance Sampling (AIS) and its Sequential Importance Sampling (SIS) extensions, the potential benefits brought by these advanced techniques have never been realized for VAE: the AIS estimate cannot be easily differentiated, while SIS requires the specification of carefully chosen backward Markov kernels. In this paper, we address both issues and demonstrate the performance of the resulting Monte Carlo VAEs on a variety of applications.

dmf, monte carlo variational auto-encoder, xobject subtype image bitspercomponent 8, (2 more...)

arXiv.org Machine Learning

2106.15921

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Add feedback

Single Image Texture Translation for Data Augmentation

Li, Boyi, Cui, Yin, Lin, Tsung-Yi, Belongie, Serge

arXiv.org Artificial IntelligenceJun-25-2021

Recent advances in image synthesis enables one to translate images by learning the mapping between a source domain and a target domain. Existing methods tend to learn the distributions by training a model on a variety of datasets, with results evaluated largely in a subjective manner. Relatively few works in this area, however, study the potential use of semantic image translation methods for image recognition tasks. In this paper, we explore the use of Single Image Texture Translation (SITT) for data augmentation. We first propose a lightweight model for translating texture to images based on a single input of source texture, allowing for fast training and testing. Based on SITT, we then explore the use of augmented data in long-tailed and few-shot image classification tasks. We find the proposed method is capable of translating input data into a target domain, leading to consistent improved image recognition performance. Finally, we examine how SITT and related image translation methods can provide a basis for a data-efficient, augmentation engineering approach to model training.

data augmentation, single image texture translation, xobject subtype image bitspercomponent 8, (2 more...)

arXiv.org Artificial Intelligence

2106.13804

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.73)

Add feedback

Low-rank Characteristic Tensor Density Estimation Part II: Compression and Latent Density Estimation

Amiridi, Magda, Kargas, Nikos, Sidiropoulos, Nicholas D.

arXiv.org Machine LearningJun-19-2021

Learning generative probabilistic models is a core problem in machine learning, which presents significant challenges due to the curse of dimensionality. This paper proposes a joint dimensionality reduction and non-parametric density estimation framework, using a novel estimator that can explicitly capture the underlying distribution of appropriate reduced-dimension representations of the input data. The idea is to jointly design a nonlinear dimensionality reducing auto-encoder to model the training data in terms of a parsimonious set of latent random variables, and learn a canonical low-rank tensor model of the joint distribution of the latent variables in the Fourier domain. The proposed latent density model is non-parametric and universal, as opposed to the predefined prior that is assumed in variational auto-encoders. Joint optimization of the auto-encoder and the latent density estimator is pursued via a formulation which learns both by minimizing a combination of the negative log-likelihood in the latent domain and the auto-encoder reconstruction loss. We demonstrate that the proposed model achieves very promising results on toy, tabular, and image datasets on regression tasks, sampling, and anomaly detection.

colorspace devicergb decodeparm color 3, filter flatedecode height 37, xobject subtype image bitspercomponent 8, (5 more...)

arXiv.org Machine Learning

2106.10591

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback